CST Bank: A Corpus for the Study of Cross-document Structural Relationships
Abstract
Clusters of multiple news stories related to the same topic exhibit a number of interesting properties. For example, when documents have been published at various points in time or by different authors or news agencies, one finds many instances of paraphrasing, information overlap, and even contradiction. This paper presents the Cross-document Structure Theory (CST) Bank, a collection of multi-document clusters in which pairs of sentences from different documents have been annotated for CST relationships. We describe how we built the corpus, including our method for using lexical similarity measures to reduce the number of sentence pairs that our hired judges had to annotate. Finally, we describe how CST and the CST Bank can be applied to research areas such as multi-document summarization.
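The pair-reduction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which lexical similarity measures or cutoffs were used, so the bag-of-words cosine measure, the 0.3 threshold, and the names `tokens`, `cosine`, and `candidate_pairs` below are all assumptions made for illustration, not the authors' method.

```python
import re
from collections import Counter
from itertools import product
from math import sqrt


def tokens(sentence):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between the bag-of-words vectors of two sentences."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def candidate_pairs(doc_a, doc_b, threshold=0.3):
    """Return cross-document sentence pairs whose similarity reaches the threshold.

    The threshold is a hypothetical value; only the surviving pairs would be
    passed on to human judges for relationship annotation.
    """
    kept = []
    for s, t in product(doc_a, doc_b):
        score = cosine(s, t)
        if score >= threshold:
            kept.append((s, t, score))
    return kept


if __name__ == "__main__":
    doc_a = ["The plane crashed near the coast on Monday.",
             "Officials opened an investigation."]
    doc_b = ["A plane crashed off the coast Monday, officials said.",
             "Stock markets rose sharply."]
    for s, t, score in candidate_pairs(doc_a, doc_b):
        print(f"{score:.2f}\t{s}\t{t}")
```

With two documents of two sentences each there are four possible pairs; under these assumptions only the pair about the plane crash survives the filter, which is the kind of reduction in annotation workload the abstract describes.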
Similar resources
Cross-document relationship classification for text summarization
Multiple documents describing the same event present some interesting challenges for natural language processing. They contain similar information and yet they also exhibit a number of interesting properties: paraphrases, partial agreement, difference in judgment and emphasis, and contradictions. When the sources track an event that evolves over time, more phenomena can be observed: additions, ...
Combining Labeled and Unlabeled Data for Learning Cross-Document Structural Relationships
Multi-document discourse analysis has emerged with the potential of improving various NLP applications. Based on the newly proposed Cross-document Structure Theory (CST), this paper describes an empirical study that classifies CST relationships between sentence pairs extracted from topically related documents, exploiting both labeled and unlabeled data. We investigate a binary classifier for de...
A Common Theory of Information Fusion from Multiple Text Sources Step One: Cross-Document Structure
We introduce CST (cross-document structure theory), a paradigm for multidocument analysis. CST takes into account the rhetorical structure of clusters of related textual documents. We present a taxonomy of cross-document relationships. We argue that CST can be the basis for multidocument summarization guided by user preferences for summary length, information provenance, cross-source agreement,...
Providing a Model for Evaluating Suspicious Bank Accounts with the Approach of Determining Tax Effects Based on Structural Equation Modeling
The main aim of this study is to provide solutions for managers, economists, and tax auditors. A clearer view of the transactional relationships that distress the taxpayer transaction tax would also help them choose the best strategy for improving tax revenue. In this paper, the fuzzy Delphi method was used to identify the indicators affecting suspicious bank accounts. The data coll...
Assessing the Validity of the Competing Sentences Test in Stroke Patients, Loghman Hospital, 1378-79
Background: Cerebrovascular diseases (CVD) are among the most common disorders that may affect the auditory cortex. In this research we evaluated the function of the CANS in a group of 50-70-year-old cerebrovascular accident (CVA) patients without hearing problems, using the Persian version of the C.S.T. Materials and Methods: This cross-sectional analytic study was established at Loghman-...
Publication date: 2004